Using Advice in Model-Based Reinforcement Learning

Authors

  • Rodrigo Toro Icarte
  • Toryn Q. Klassen
Abstract

When a human is mastering a new task, they are usually not limited to exploring the environment, but also avail themselves of advice from other people. In this paper, we consider the use of advice expressed in a formal language to guide exploration in a model-based reinforcement learning algorithm. In contrast to constraints, which can eliminate optimal policies if they are not sound, advice is merely a recommendation about how to act that may be of variable quality or incomplete. To provide advice, we use Linear Temporal Logic (LTL), which was originally proposed for the verification of properties of reactive systems. In particular, LTL formulas can be used to provide advice to an agent about how they should behave over time by defining a temporal ordering of state properties that the agent is recommended to avoid or achieve. LTL thus represents an alternative to existing methods for providing advice to a reinforcement learning agent, which explicitly suggest an action or set of actions to use in each state. We also identify how advice can be incorporated into a model-based reinforcement learning algorithm by describing a variant of R-MAX which uses an LTL formula describing advice to guide exploration. This variant is guaranteed to converge to an optimal policy in deterministic settings and to a near-optimal policy in non-deterministic environments, regardless of the quality of the given advice. Experimental results with this version of R-MAX on deterministic grid world MDPs demonstrate the potential for good advice to significantly reduce the number of training steps needed to learn strong policies, while still maintaining robustness in the face of incomplete or misleading advice.
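As an illustration of how an LTL formula can encode "a temporal ordering of state properties that the agent is recommended to avoid or achieve", the sketch below tracks a piece of advice along a trace of states using formula progression, a standard technique for LTL. The abstract does not say how the authors' R-MAX variant processes advice, so this is not their implementation. The tuple encoding, the function names (`progress`, `progress_trace`), and the propositions `key`, `door`, and `nail` are all hypothetical, and only a small fragment of LTL is covered.

```python
# Minimal sketch of LTL formula progression over a small fragment.
# Assumption: formulas are nested tuples; True/False are constant formulas.
# All names and propositions here are illustrative, not from the paper.

def mk_and(a, b):
    """Conjunction with constant simplification."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def mk_or(a, b):
    """Disjunction with constant simplification."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


def progress(formula, state):
    """Return what must hold from the next step on, given the current state.

    `state` is the set of propositions that are true right now.
    """
    if formula is True or formula is False:
        return formula
    op = formula[0]
    if op == "atom":          # proposition must hold now
        return formula[1] in state
    if op == "not":           # proposition must not hold now
        return formula[1] not in state
    if op == "and":
        return mk_and(progress(formula[1], state), progress(formula[2], state))
    if op == "or":
        return mk_or(progress(formula[1], state), progress(formula[2], state))
    if op == "next":          # obligation moves to the next step unchanged
        return formula[1]
    if op == "eventually":    # satisfied now, or still pending
        return mk_or(progress(formula[1], state), formula)
    if op == "always":        # must hold now and keep holding
        return mk_and(progress(formula[1], state), formula)
    raise ValueError(f"unsupported operator: {op!r}")


def progress_trace(formula, trace):
    """Progress a formula through a sequence of states."""
    for state in trace:
        formula = progress(formula, state)
    return formula


# Advice: "get the key, later reach the door, and never step on a nail".
advice = (
    "and",
    ("eventually",
     ("and", ("atom", "key"), ("next", ("eventually", ("atom", "door"))))),
    ("always", ("not", "nail")),
)

followed = progress_trace(advice, [set(), {"key"}, set(), {"door"}])
violated = progress_trace(advice, [{"nail"}])
```

In this sketch, `followed` reduces to the residual obligation `("always", ("not", "nail"))`, meaning the ordering part of the advice has been achieved, while `violated` collapses to `False`. An exploration strategy could, as one possible design, prefer actions whose resulting states keep the progressed formula away from `False`; because advice is only a recommendation, such a preference would bias exploration rather than forbid actions.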

Similar resources

Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic

In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...

Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another

We present a method for transferring knowledge learned in one task to a related task. Our problem solvers employ reinforcement learning to acquire a model for one task. We then transform that learned model into advice for a new task. A human teacher provides a mapping from the old task to the new task to guide this knowledge transfer. Advice is incorporated into our problem solver using a knowl...

Computational Models for the Combination of Advice and Individual Learning

Decision making often takes place in social environments where other actors influence individuals' decisions. The present article examines how advice affects individual learning. Five social learning models combining advice and individual learning (four based on reinforcement learning and one on Bayesian learning) and one individual learning model are tested against each other. In two experiments...

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...

Using BELBIC based optimal controller for omni-directional three-wheel robots model identified by LOLIMOT

In this paper, an intelligent controller is applied to control the motion of omni-directional robots. First, the dynamics of the three-wheel robots, as a nonlinear plant with considerable uncertainties, are identified using an efficient training algorithm named LoLiMoT. Then, an intelligent controller based on a brain emotional learning algorithm is applied to the identified model. This emotional l...

Publication date: 2017